Extended-Alphabet Finite-Context Models
Authors
Abstract
The Normalized Relative Compression (NRC) is a recent dissimilarity measure related to Kolmogorov complexity. It has been used successfully in several applications, such as DNA sequences, images, and even electrocardiographic (ECG) signals. It relies on a compressor that compresses a target string using exclusively the information contained in a reference string. One possible approach is to represent the strings with finite-context models (FCMs). A finite-context model calculates the probability distribution of the next symbol given the previous k symbols. In this paper, we introduce a generalization of FCMs, called extended-alphabet finite-context models (xaFCMs), which calculate the probability of occurrence of the next d symbols given the previous k symbols. We perform experiments on two sample applications using xaFCMs and the NRC measure: ECG biometric identification, using a publicly available database, and estimation of the similarity between the DNA sequences of two different but related species, chromosome by chromosome. In both applications, we compare the results against those obtained with FCMs. The results show that xaFCMs use less memory and computation time to achieve the same or, in some cases, even more accurate results.
arXiv:1709.07346v2 [cs.IT] 15 Mar 2018
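As a rough illustration of how an xaFCM could drive the NRC, the Python sketch below builds counts from a reference string only and then estimates the bits needed to code a target. It is a minimal sketch under stated assumptions, not the authors' implementation: the additive estimator with parameter `alpha`, the sliding-window training, the non-overlapping d-symbol coding steps, the uniform cost charged for the first k symbols and any short tail, and the helper names `train_xafcm`, `relative_bits` and `nrc` are all hypothetical. It normalizes by |x| log2 |A|, the usual form of the NRC. With `d=1` the model reduces to an ordinary order-k FCM.

```python
# Hypothetical sketch of an extended-alphabet finite-context model (xaFCM)
# used to estimate the NRC; estimator and coding details are assumptions.
import math
from collections import Counter, defaultdict


def train_xafcm(reference, k, d):
    """Count how often each d-symbol block follows each k-symbol context.

    Every position of the reference is used (sliding window), so the
    model only sees information contained in the reference string.
    """
    counts = defaultdict(Counter)
    for i in range(k, len(reference) - d + 1):
        counts[reference[i - k:i]][reference[i:i + d]] += 1
    return counts


def relative_bits(target, counts, k, d, alphabet_size, alpha=1.0):
    """Estimate the bits needed to code `target` with the reference model.

    The target is coded one extended symbol (d original symbols) at a time,
    using an additive estimator (assumed here):
        P(block | ctx) = (n(ctx, block) + alpha) / (n(ctx) + alpha * |A|^d)
    """
    ext_size = alphabet_size ** d            # size of the extended alphabet A^d
    uniform = math.log2(alphabet_size)       # bits per symbol with no model
    bits = min(k, len(target)) * uniform     # first k symbols lack a full context
    i = k
    while i + d <= len(target):
        ctx, block = target[i - k:i], target[i:i + d]
        seen = counts.get(ctx, Counter())    # empty if context never seen
        p = (seen[block] + alpha) / (sum(seen.values()) + alpha * ext_size)
        bits -= math.log2(p)                 # add -log2 P(block | ctx)
        i += d                               # advance by one extended symbol
    bits += max(0, len(target) - i) * uniform  # tail shorter than d symbols
    return bits


def nrc(target, reference, k, d, alphabet, alpha=1.0):
    """NRC(x||y) = C(x||y) / (|x| * log2 |A|)."""
    counts = train_xafcm(reference, k, d)
    bits = relative_bits(target, counts, k, d, len(alphabet), alpha)
    return bits / (len(target) * math.log2(len(alphabet)))


if __name__ == "__main__":
    ref = "ACGT" * 25
    # A target that repeats the reference pattern should score lower
    # than one whose contexts never occur in the reference.
    print(nrc("ACGTACGTACGT", ref, k=2, d=2, alphabet="ACGT"))
    print(nrc("TTGCATTGCAGG", ref, k=2, d=2, alphabet="ACGT"))
```

In this sketch the number of coding steps drops from about |x| to about |x|/d, at the cost of an extended alphabet of size |A|^d per context.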
Similar resources
Stochastic chains with memory of variable length
Stochastic chains with memory of variable length constitute an interesting family of stochastic chains of infinite order on a finite alphabet. The idea is that for each past, only a finite suffix of the past, called context, is enough to predict the next symbol. These models were first introduced in the information theory literature by Rissanen (1983) as a universal tool to perform data compres...
Complexity of Problems for Commutative Grammars
We consider Parikh images of languages accepted by non-deterministic finite automata and context-free grammars; in other words, we treat the languages in a commutative way — we do not care about the order of letters in the accepted word, but rather how many times each one of them appears. In most cases we assume that the alphabet is of fixed size. We show tight complexity bounds for problems li...
Automata and Logics for Words and Trees over an Infinite Alphabet
In a data word or a data tree each position carries a label from a finite alphabet and a data value from some infinite domain. These models have been considered in the realm of semistructured data, timed automata and extended temporal logics. This paper surveys several known results on automata and logics manipulating data words and data trees, the focus being on their relative expressive power a...
On Commutative Context-Free Languages
Let $\Sigma = \{a_1, a_2, \ldots, a_n\}$ be an alphabet and let $L \subseteq \Sigma^*$ be the commutative image of $FP^*$, where $F$ and $P$ are finite subsets of $\Sigma^*$. If, for any permutation $\sigma$ of $\{1, 2, \ldots, n\}$, $L \cap a_{\sigma(1)}^* \cdots a_{\sigma(n)}^*$ is context-free, then $L$ is context-free. This theorem provides a solution to the Fliess conjecture in a restricted case. If the result could be extended to finite unions of the $FP^*$ above, the Fliess conjectur...
Zero Temperature Limits of Gibbs-Equilibrium States for Countable Alphabet Subshifts of Finite Type
Let $A$ be a subshift of finite type on a countably infinite alphabet, and suppose that the function $f : A \to \mathbb{R}$ has summable variations. Further assumptions on $f$ ensure it has a unique Gibbs-equilibrium state $\mu_f$ (see Section 2 for more details). The purpose of this article is to analyse the behaviour, as $t \to \infty$, of the Gibbs-equilibrium states $\mu_{tf}$ of $tf$. It will be shown that the family $(\mu_{tf})_{t \geq 1}$ ...
Journal: CoRR
Volume: abs/1709.07346
Publication date: 2017